Abstract

Background: Isatis indigotica Fort. is one of the most commonly used traditional Chinese medicines. Its antiviral
compound is a lignan, which is formed through the action of dirigent proteins (DIR). DIR proteins are members
of a large protein family that imparts stereoselectivity on the phenoxy radical-coupling reaction, yielding
optically active lignans from two molecules of E-coniferyl alcohol. They exist in almost every vascular plant. However,
the DIR and DIR-like protein gene family in I. indigotica has not yet been analyzed in detail. This study is the first
to focus on the discovery and analysis of this protein gene family in I. indigotica.

Results: Analysis of the transcription profiling database of I. indigotica revealed a family of 19 full-length unique DIR and
DIR-like proteins. Sequence analysis found that the I. indigotica DIR and DIR-like proteins (IiDIR) were all-beta strand proteins
with a signal peptide at the N-terminus. Phylogenetic analysis of the 19 proteins indicated that the IiDIR genes cluster
into three distinct subfamilies, DIR-a, DIR-b/d, and DIR-e, of a larger plant DIR and DIR-like gene family. Gene-specific
primers were designed for the 19 unique IiDIRs and used to evaluate patterns of constitutive expression in different
organs. Most IiDIR genes were expressed at comparatively higher levels in roots and flowers than in stems and leaves.

Conclusions: New DIR and DIR-like proteins were discovered in the transcription profiling database of I. indigotica
through bioinformatics methods for the first time. Sequence characteristics and transcript abundance of these new genes
were analyzed. This study provides basic data necessary for further studies.

Keywords: Dirigent and dirigent-like proteins, Isatis indigotica, Bioinformatics, Secondary structures, Tertiary structures,
Phylogenetic analysis, Transcript abundance



Background 

Isatis indigotica Fort. is one of the most commonly used plants in traditional Chinese medicine
for its anti-inflammatory and antiviral activities [1]. Its leaves are called "Daqingye"
(Folium Isatidis), which can be used for the treatment of high fever, epidemic parotitis,
pharyngitis and erysipelas. The root of I. indigotica is the well-known Chinese medicine
"Banlangen" (Radix Isatidis), which is widely used in China for flu and infections of the
upper respiratory tract. During the epidemic of severe acute respiratory syndrome (SARS) in
2003, Banlangen showed potential for the prevention of SARS [2]. However, the antiviral
compounds of I. indigotica remained unknown until Li [3] found that lariciresinol isolated
from this plant was useful for the treatment of influenza A1 virus.

Lariciresinol is a lignan that has been widely studied and reported to possess a number of
biological activities, including antimicrobial, antioxidant, anti-inflammatory and
anti-estrogenic properties, which may reduce the risk of cardiovascular diseases as well as
certain types of cancer [4-9]. The precursor of lariciresinol is pinoresinol, which is formed
from E-coniferyl alcohol by the action of dirigent proteins (DIR) [10].

Dirigent (Latin: dirigere, to guide or align) proteins are members of a large family that
imparts stereoselectivity on the phenoxy radical-coupling reaction. These proteins can capture
free-radical intermediates derived from E-coniferyl alcohol (only E-coniferyl alcohol, not
p-coumaryl or sinapyl alcohols, which differ only in the degree of aromatic methoxylation [10])
and orient these radicals in such a way as to enable 8-8' coupling with concomitant
intramolecular cyclization to afford optically active (+)- or (-)-pinoresinol [11-14]. In the
absence of DIR proteins, only non-specific radical-radical coupling occurs at the 8-8', 8-5'
or 8-O-4' positions, resulting in the formation of racemic lignan products [12-14].

DIR proteins exist in almost every vascular plant [15]. Ralph and coworkers [16] suggested
that the DIR proteins are subdivided into five groups: the DIR-a, DIR-b, DIR-c, DIR-d and
DIR-e subfamilies. As more DIR proteins were identified, the DIR-b and DIR-d subfamilies were
combined, and the DIR-f and DIR-g subfamilies were added [17]. However, only members of the
DIR-a subfamily have been studied for their biochemical functions; the other proteins are
referred to as DIR-like proteins. The DIR and DIR-like protein gene family in I. indigotica
has not yet been analyzed in detail. From the transcription profiling of I. indigotica [18],
19 full-length IiDIRs (the dirigent or dirigent-like protein genes of I. indigotica) were
mined through bioinformatics. Here we report an inventory and sequence analysis as well as
the phylogenetic relationships of the IiDIR gene family. A detailed quantitative real-time PCR
expression analysis in constitutive I. indigotica tissues is described for the 19 IiDIRs.
Finally, we provide a transcriptome analysis of IiDIRs based on data from samples treated
with MeJA at different time points.

Results 

Discovery of IiDIRs from the I. indigotica transcription
profiling database

Using TBLASTN and BLASTN (Basic Local Alignment Search Tool 2.2.26) against the I. indigotica
transcription profiling database with released DIR and DIR-like protein sequences, we obtained
19 putative IiDIR sequences (Additional file 1). The best-hit homologous genes of these 19
sequences are summarized in Additional file 2. The number and subfamily designation of the
IiDIR genes were based on the topology of the 19 IiDIRs with 178 other DIRs according to
Ralph [17] and Arasan [19]. Typical dirigent domains were found in these 19 IiDIR protein
sequences through the simple modular architecture research tool (SMART,
http://smart.embl-heidelberg.de/) [20] (Additional file 3).

Sequence analysis

The length of the predicted open reading frames (ORFs) for the 19 cDNAs ranged from 183 aa
(IiDIR1) to 414 aa (IiDIR19). The 19 IiDIRs had predicted molecular masses ranging from circa
20.17 (IiDIR8) to 39.94 (IiDIR19) kDa and predicted pI values ranging from 4.79 (IiDIR16) to
9.85 (IiDIR8) (Additional file 3).

Using the TargetP 1.1 Server (http://www.cbs.dtu.dk/services/TargetP/) [21] and the
WoLF PSORT (http://www.genscript.com/psort/wolf_psort.html) [22] subcellular localization
software, most of the 19 IiDIRs were predicted to be targeted to the secretory pathway, either
through the default pathway for extracellular release, or for possible final localization in
the vacuole, chloroplast or cytoplasm. Signal peptide prediction showed that most of the
IiDIRs had a signal peptide of 20-30 aa at the N-terminus, except IiDIR13, IiDIR15, and
IiDIR18. All IiDIRs except IiDIR12/13/14/15/16/17 were found to contain N-glycosylation sites
(Asn), a feature of secreted proteins, using the NetNGlyc 1.0 server
(http://www.cbs.dtu.dk/services/NetNGlyc/) [23] (Additional file 3). The SMART results showed
that IiDIR3/4/12/13/14/15/17/19 had a transmembrane region. The molecular formula was
calculated with ProtParam (http://web.expasy.org/protparam/) [24]. The protein with the
most sulfur atoms was IiDIR11 (12 sulfur atoms),
while liDIRlS and liDIRYJ only had three sulfur ele- 
ments, respectively (Additional file 3). 
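
The sulfur counts reported above come from the ProtParam molecular formula. For an unmodified polypeptide, the same number can be reproduced by counting cysteine and methionine residues, since each contributes exactly one sulfur atom. The following is a minimal sketch under that assumption; the function name and the example peptide are hypothetical and are not IiDIR data.

```python
def count_sulfur_atoms(protein_seq: str) -> int:
    """Count sulfur atoms in an unmodified polypeptide.

    Assumption: only Cys (C) and Met (M) carry sulfur, one atom each.
    """
    seq = protein_seq.upper()
    return seq.count("C") + seq.count("M")


# Hypothetical peptide used only for illustration (not an IiDIR sequence).
example_peptide = "MACKMTC"
print(count_sulfur_atoms(example_peptide))
```

Applied to a translated ORF, this gives the S subscript of the molecular formula without needing the web server.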

Pairwise sequence identities among the predicted amino acid sequences of the 19 IiDIRs ranged
from a low of 14.4% (IiDIR4 vs. IiDIR13, IiDIR14 and IiDIR15, respectively) to a high of 98.1%
(IiDIR14 vs. IiDIR15) (Table 1). IiDIR14 and IiDIR15 are an example of closely related
proteins sharing amino acid identity greater than 98% that may represent within-species alleles.

Secondary structures of IiDIRs

Additional file 4 presents the secondary structure polygrams of the 19 IiDIRs, predicted by
NetSurfP (http://www.cbs.dtu.dk/services/NetSurfP/) [25]. The 19 IiDIR proteins could be
divided into several groups according to their secondary structures. The differences among
the IiDIRs lay in the residues before the first β-strand. In this respect, IiDIR16/17/18/19
were far removed from IiDIR1 to IiDIR15. Among the first 15 IiDIRs, IiDIR13/14/15 differed
from the others in the shape of the region between the first and second β-strands. IiDIR12
differed from the others in the shape of the last β-strand. IiDIR1/2/3/4 differed from the
other IiDIRs in having a smooth coil curve before the first β-strand.

To confirm the accuracy of the NetSurfP predictions, secondary structures were also predicted
with PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/) and Phyre2
(http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index) [26]. The positions of the
β-strands were in overall agreement with the NetSurfP predictions (data not shown).
Table 1 Sequence relatedness of IiDIRs

IiDIR7 (g)
25.6 22.5
IiDIR8 (h)
66.3 21.3 17.5 39.4 45.0 19.4 19.4 18.8 41.3 41.9 45.6 44.4
IiDIR11 (k)
22.5 26.3 22.5 20.6 44.4 45.6 43.8 43.8 41.9 46.3
IiDIR12 (l)
18.1 20.0 17.5
IiDIR13 (m)
17.5 23.1
IiDIR14 (n)
19.4 14.4 23.8 23.8 23.8 24.4 21.9
IiDIR15 (o)
15.6 23.8 23.8 23.8 21.3 17.5 97.5 98.1 -
IiDIR16 (p)
20.0 20.6 17.5 16.9 24.4 24.4 25.6 24.4 23.8 23.1 26.9 20.0 41.3 41.3 41.9 -
IiDIR17 (q)
18.1 18.8 15.6 15.0 21.3 23.8 23.8 22.5 22.5 21.3 25.6 18.8 42.5 42.5 43.1 90.0 -
IiDIR18 (r)
15.6 18.8 16.9 16.3 25.0 23.1 26.9 25.6 24.4 25.6 24.4 16.9 50.6 50.6 50.6 55.0 56.3 -
IiDIR19 (s)
15.6 18.8 15.6 15.0 24.4 22.5 26.3 23.1 23.1 25.6 16.3 51.9 51.9 51.9 54.4 55.6 91.9 -

Results from pairwise amino acid sequence comparisons, using complete open reading frames, are shown as
percent identity among members of the DIR-a, DIR-b/d and DIR-e subfamilies. Comparisons within a subfamily
are printed in normal type, whereas comparisons between proteins of the subfamilies DIR-a and DIR-b/d,
DIR-a and DIR-e, and DIR-b/d and DIR-e are highlighted in italic, underline, and bold, respectively.



Tertiary structures and homologs of IiDIRs

Figure 1 shows the predicted three-dimensional structures of the 19 IiDIR proteins. Structures
were modeled using the Phyre2 server
(http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index) [26]. For all of the 19 queried
sequences, the same three top-scoring proteins were found, all of which belong to the allene
oxide cyclase-like protein (AOC) family. The AOC barrel-like protein d2brja1, which shared
only 17-26% sequence identity with the IiDIRs, was predicted as a DIR homolog with about 98%
probability, followed by two hypothetical proteins with similar probabilities (d1zvca1 and
c4h69A, Table 2). Among the Phyre2 predictions, IiDIR14 and IiDIR18 each showed the highest
confidence, 98.3%, with the template d2brja1.

Phylogenetic analysis of the IiDIRs

To obtain clues about the evolutionary relationships and topological structures of the
IiDIRs, a multiple sequence alignment of the amino acid sequences of the 19 full-length cDNAs
was used to build a Neighbor-Joining (NJ) tree with 1000 bootstrap replicates and complete
deletion of gaps/missing data (Figure 2). The 19 IiDIRs were clearly separated into three
distinct groups based on sequence relatedness. The amino acid sequences of IiDIR1/2/3/4
clustered into Group 1, while IiDIR5/6/7/8/9/10/11 clustered into the second group. These two
groups were in accordance with the secondary structures. IiDIR13/14/15/16/17/18/19 clustered
into another group. IiDIR12 lay between Group 1 and the other IiDIRs.

To test the reliability of the NJ tree, a Maximum Likelihood (ML) analysis was also carried
out to generate a phylogenetic tree using default parameters and 1000 bootstrap replicates
(Additional file 5, -ln = 3970.33, model: WAG + F). The two trees had similar topologies with
three clusters, indicating that the two methods were in good agreement.

To better understand the divergences and similarities of DIR and DIR-like protein sequences
among I. indigotica and other plants, a provisional molecular phylogenetic tree was
constructed using a multiple sequence alignment from various plant species. The gene sequences
were as follows: 29 genes from Brassica rapa, 25 genes from Arabidopsis thaliana, 54 genes
from Oryza sativa, 35 genes from spruce, 9 genes from Thuja plicata, and an additional 27
DIRs identified from a variety of species, including pea, cotton, corn, sesame, etc. [17,19].
In this tree, the subfamilies according to Ralph [17] are shown in different colors. However,
only the DIR-a, DIR-c and DIR-f subfamilies clustered separately. The DIR-e and DIR-b/d
subfamilies were mixed with genes from the DIR-g subfamily. These DIR-g
Figure 1 Cartoon-style model of IiDIRs derived from prediction.
IiDIR8  IiDIR12  IiDIR16

subfamily genes were all from B. rapa. The phylogenetic tree indicated that the IiDIRs cluster
into three groups, DIR-a, DIR-b/d and DIR-e (Figure 3). IiDIR1/2/3/4 grouped into subfamily
DIR-a, along with 5 A. thaliana genes and 6 B. rapa genes. IiDIR5/6/7/8/9/10/11 grouped into
subfamily DIR-b/d, along with 14 A. thaliana genes and 16 B. rapa genes.
IiDIR13/14/15/16/17/18/19 grouped into subfamily DIR-e, along with 6 A. thaliana genes and
mixed with 4 BrDIRs from the DIR-g subfamily. IiDIR12 fell outside subfamily DIR-e, but we
assigned it to subfamily DIR-e.

Sequence comparison 

The IiDIR sequences were analyzed to determine whether any putative functions could be
inferred. The topology analyses showed that the IiDIRs belong to the DIR-a, DIR-b/d, and
DIR-e subfamilies. Recent studies have focused only on the function of the DIR-a subfamily,
classifying DIRs from the other subfamilies as DIR-like proteins. According to Pickel [11,27],
AtDIR5 and AtDIR6 from the DIR-a subfamily of A. thaliana differ from DIRs found earlier,
such as FiDIR1 and TpDIR7 [28]. The first DIR, from Forsythia suspensa, was found to guide
E-coniferyl alcohol to form (+)-pinoresinol [12], and many other DIRs have the same function
[29]. However, in the presence of AtDIR5, the final product from E-coniferyl alcohol was the
enantiomer (-)-pinoresinol. In the topology tree of DIRs from different species (Figure 3),
IiDIR2/3/4 were adjacent to AtDIR6, and IiDIR1 was next to AtDIR5. These observations suggest
that IiDIR1/2/3/4 might have functions similar to those of AtDIR6 and AtDIR5.
Table 2 The probability and identity of homologous relationship of IiDIRs

d2brja1 c4h69A d1zvca1
24
24
24
22
Sequence comparisons between AtDIR6 and IiDIR2/3/4, as well as among AtDIR5, IiDIR1, FiDIR1
and TpDIR7, were performed with ClustalX 2.1 [30]. The results showed that IiDIR2 had 93.05%
identity with AtDIR6, while IiDIR3 and IiDIR4 had only 68.45% and 66.31% identity with
AtDIR6. This suggests that IiDIR3 and IiDIR4 are more distantly related to AtDIR6 than IiDIR2
is. IiDIR1 had 90.66% identity with AtDIR5. Residue conservation is shown in Figure 4.

To examine sequence features of these IiDIR sequences, a sequence comparison between the 19
IiDIRs and 29 BrDIRs was carried out as well. Like the 29 BrDIRs, the 19 IiDIRs showed five
well-conserved motifs in their amino acid sequences (Figure 5).

Transcript abundance analysis of IiDIRs in different
organs

Since the transcript abundance of a gene is often correlated with its function, the relative
constitutive abundance of the 19 IiDIRs was quantified in total RNA isolated from roots,
stems, leaves and flowers by real-time PCR using gene-specific primers (Additional file 6).
The organ-specific expression of each IiDIR gene was normalized to actin as the control and
compared with root as the reference using the 2^-ΔΔCt method. The transcript abundance levels
are shown in Figure 6.
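
The normalization to actin and the comparison with root as reference described above can be illustrated with the comparative Ct calculation. This is a minimal sketch, assuming the standard 2^-ΔΔCt formula with roughly 100% amplification efficiency; the function name and all Ct values are hypothetical and are not measurements from this study.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_calibrator: float, ct_ref_calibrator: float) -> float:
    """Fold change of a target gene by the comparative Ct (2^-ddCt) method.

    sample     : organ of interest (e.g. flower)
    calibrator : reference organ (root in the text above)
    ref        : reference gene (actin in the text above)
    """
    delta_sample = ct_target_sample - ct_ref_sample                # dCt in the organ of interest
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator    # dCt in the calibrator organ
    delta_delta = delta_sample - delta_calibrator                  # ddCt
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: target 22 / actin 18 in flower, target 25 / actin 18 in root.
print(relative_expression(22.0, 18.0, 25.0, 18.0))
```

By construction the calibrator organ itself always evaluates to 1, so every value in a plot of this kind reads as "fold relative to root".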
Figure 2 Neighbor-Joining (NJ) phylogenetic tree of 19 IiDIRs.

The values on the branches are bootstrap proportions, which indicate
the percentage of 1000 repetitions of the analysis in which this particular
branching was obtained. The lengths of branches are proportional to
evolutionary distances between species.
Figure 3 Phylogenetic tree of plant DIR and DIR-like protein sequences. Amino acid sequences of 197 dirigent or dirigent-like (DIR) proteins were
analyzed by Maximum Likelihood (ML) using MEGA 5.05 (-ln = 2880.02, model: WAG + F). Subfamilies DIR-a, DIR-b/d, DIR-c, DIR-e, DIR-f and DIR-g
are indicated by pink, yellow, green, purple, sky blue and pink-purple shading, respectively. The AtDIRs are colored in red and BrDIRs are colored in
dark green. IiDIRs are marked in normal type. DIR nomenclature is as follows: Ah, Arachis hypogaea; As, Agrostis stolonifera; At, Arabidopsis thaliana; Br,
Brassica rapa; Fi, Forsythia intermedia; Gb, Gossypium barbadense; Hv, Hordeum vulgare; Ii, Isatis indigotica; Nb, Nicotiana benthamiana; Os, Oryza
sativa; P, Picea glauca, Picea sitchensis or P. glauca x engelmannii; Pp, Podophyllum peltatum; Ps, Pisum sativum; Sb, Sorghum bicolor; Si, Sesamum
indicum; So, Saccharum officinarum; Ta, Triticum aestivum; Tan, Tamarix androssowii; Th, Tsuga heterophylla; Tp, Thuja plicata and Zm, Zea mays.



Based on RT-PCR analysis, 5 IiDIRs (IiDIR2/5/10/15/18) displayed the highest transcript
abundance in all tissues. Another 5 IiDIRs (IiDIR3/4/11/13/17) showed higher transcript
abundance in roots and flowers than in stems and leaves. IiDIR6 was higher in leaves and
IiDIR1 was higher in flowers. IiDIR1/12/13/16/19 were hardly expressed in leaves. The
remaining two IiDIRs (IiDIR8 and IiDIR9) were barely detected in any tissue
(Additional file 7).



Compared with transcript abundance in roots, IiDIR7 was more than 500-fold higher in flowers.
All of these genes except IiDIR6 were expressed at lower levels in leaves than in other
tissues. Most IiDIRs, such as IiDIR2/10/11/14/15/17/18, had comparatively higher transcript
abundance in roots and flowers than in stems and leaves.
The transcript abundance of liDIR^ and liDIRS were higher 
in stems and flowers than in roots and leaves. liDIRVl and 
liDIRlS were expressed more in stems (Figure 6). 
Figure 4 Sequence comparison between DIRs from Forsythia intermedia, Thuja plicata, Arabidopsis thaliana and Isatis indigotica.

Residues conserved in all of the sequences are indicated in black. Sequence conservation between A. thaliana and I. indigotica is highlighted in
blue. Conservation between T. plicata and F. intermedia is highlighted in green. Conservation between AtDIR6, IiDIR2, IiDIR3 and IiDIR4 is
highlighted in red. Conservation between AtDIR5 and IiDIR1 is highlighted in yellow. Conservation between AtDIR6 and IiDIR2 is highlighted in
gray. Predicted N-terminal signal peptides are shown in italics with underline.



Transcript abundance analysis after treatment with MeJA 

MeJA was used to induce gene expression in hairy roots of I. indigotica for different lengths
of time. The IiDIR transcript abundance is shown in Figure 7. The result of liDIRlS was not
tested during this experiment. IiDIR1/2/4/5/11 were down-regulated at 1, 3, 6, 12 and 24 h
compared with 0 h. IiDIR8/9/10/16 were up-regulated at 1, 3, 6, 12 and 24 h compared with
0 h. The remaining genes were up- or down-regulated at different times. IiDIR8 and IiDIR9
were barely expressed in roots, stems, leaves or flowers, but both were up-regulated after
treatment with MeJA, and the up-regulation lasted until the end of the experiment. This
indicates that IiDIR8 and IiDIR9 may take part in the defense response. IiDIR6/7/12/16/19
were expressed at low levels in roots. After treatment with MeJA, they were up-regulated at
different times, and the up-regulation lasted for a period of time.

Discussion 

DIR and DIR-like proteins belong to a multigene family. They are found in all of the major
terrestrial plants [15] and are considered to have developed an important enzymatic reaction
for the production of lignin and lignan during the adaptation of aquatic plants to the
terrestrial environment [31]. The number of DIR genes differs among plant species. There are
25 DIRs in A. thaliana, 29 DIRs in B. rapa and 54 DIRs in rice [17,19]. In this study, 19
DIRs were discovered in I. indigotica by bioinformatics methods for the first time.
Figure 5 Five conserved characteristic motifs (I-V) of dirigent proteins in IiDIR and BrDIR protein sequences.



From the prediction of the ORFs, combined with the topological structures, the DIR amino acid
sequences were found to be divergent, ranging from the shortest protein of 183 aa (IiDIR1) to
the longest of 414 aa (IiDIR19). Members of the DIR-e subfamily are longer than DIRs from the
other subfamilies; they range from 224 aa (/jDIRlS) to 414 aa (IiDIR19). IiDIRs in the DIR-a
subfamily range from 183 aa (IiDIR1) to 188 aa (IiDIR2, IiDIR3 and IiDIR4). The lengths of
the IiDIRs in the DIR-b/d subfamily are similar to those of the DIR-a subfamily, ranging from
186 aa (IiDIR5 and IiDIR6) to 191 aa (IiDIR12).

Sequence analysis of the IiDIRs, using currently available web-based bioinformatics tools
(http://www.cbs.dtu.dk), indicated that most IiDIRs have cleavable N-terminal signal peptides
of 20 aa to 30 aa (Additional file 4), suggesting an extracellular localization. This means
that these IiDIRs are likely to be secreted proteins. N-glycosylation sites (Asn) are a
feature of secreted proteins and have been found in FiDIR1, the first and best characterized
DIR protein [32]. Thirteen of these 19 IiDIRs have more than one Asn site, also indicating
that most IiDIRs are likely to be secreted proteins.
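
Candidate N-glycosylation sites of the kind discussed above are conventionally located at the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below scans for that motif only; it is not the NetNGlyc method, which additionally scores each sequon with a trained predictor, and the function name and test peptide are hypothetical.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein_seq: str) -> list[int]:
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein_seq.upper())]


# Hypothetical peptide used only for illustration (not an IiDIR sequence).
print(find_sequons("MNGSANPTQNNTS"))
```

A motif scan like this gives an upper bound on candidate sites; whether a given Asn is actually glycosylated depends on context that the pattern alone does not capture.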

The secondary and tertiary structures show that the IiDIRs are all β-strand proteins. All the
IiDIRs have potential β-strands separated by coil regions. The β-strands shape the IiDIR
proteins like a barrel. The only α-helix is at the N-terminus and appears to be the signal
peptide. Both the NetSurfP and PSIPRED predictions of the secondary structure of the IiDIRs
show that DIRs are all-beta strand proteins. This is in agreement with previous studies [11].
Halls et al. [14] confirmed by circular dichroism analysis that the (+)-pinoresinol-forming
dirigent protein from F. intermedia is primarily composed of β-sheet and loop structures.

Hitherto, DIRs have not been crystallized [11], and X-ray or NMR structures remain
unavailable. Homologous proteins with known structures can serve as templates for modeling of
DIRs. Therefore, Phyre2 was used in intensive modeling mode to predict the tertiary
structures of the IiDIRs, as well as to search for their homologous proteins. The results
show that all the IiDIRs are barrel-like proteins.

The topological tree (Figure 3) showed that the IiDIRs are divided into the DIR-a, DIR-b/d,
and DIR-e subfamilies. PDIR17 is separated from the DIR-b/d subfamily and clustered with the
DIR-g subfamily. OsDIR1/2/3/4/9/49 from the DIR-g subfamily are clustered with the DIR-b/d
subfamily. This is in agreement with Ralph's studies [16,17]. Ralph subsequently found that
several sequences from the previously distinct DIR-d cluster merged with the former DIR-b
subfamily to form the new DIR-b/d subfamily, leaving the rice DIR-like proteins from the
former DIR-d subfamily as a separate subfamily, DIR-g [17]. In this study, members of the
DIR-b/d and DIR-g subfamilies are combined again. This might be the result of the expanded
set of DIR genes.

It should be noted that the transcript abundance of most IiDIRs is comparatively higher in
roots and flowers than in stems and leaves. This is in accordance with Arasan's findings for
B. rapa DIRs [19]. DIR genes are well known to participate in lignin biosynthesis. The
organ-specific transcript abundance of IiDIRs in this study therefore suggests that IiDIRs
may play roles in specific organs through lignin formation and participate in the
developmental processes of I. indigotica. These organs also share characteristics that make
them particularly prone to other stresses, against which they must protect themselves. To
determine IiDIR transcript abundance in roots under MeJA stress, differential expression of
IiDIR genes was mined from the I. indigotica expression profiling database [18]. It shows
that IiDIR8 and IiDIR9 may take part in the defense response, because they are barely
expressed in roots, stems, leaves or flowers, but both are up-regulated after treatment with
MeJA, and the up-regulation lasts until the end of the experiment.

Conclusions 

In this study, 19 DIRs were identified from the I. indigotica transcription profiling
database for the first time. Sequence characteristics and transcript abundance of these 19
full-length IiDIRs were analyzed. The results showed that IiDIR1 and IiDIR2 are similar to
AtDIR5 and AtDIR6. They might have the ability to produce (-)-lignans. The organ-specific
expression, which is higher in roots and flowers than in stems and leaves, indicates that
roots and flowers may synthesize more lignin during plant development.
IiDIR6/7/8/9/12/16/19 were up-regulated after treatment with MeJA, suggesting that they
Figure 7 Heat map of IiDIR transcript expression obtained after
treatment of hairy roots with MeJA. A color bar indicates fold-change
expression differences on a natural log scale (treatment/control). Hairy
roots of I. indigotica were treated with MeJA for 0, 1, 3, 6, 12 and 24 h. 0 h
was designated as the control.




may take part in the defense response. These results provide
basic data necessary for further studies.



IiDIR sequences with an e-value of 1e-5. After removing sequences with an alignment length of
less than 500 bp (the gene length of DIR was longer than 550 bp), there were 314 candidate
IiDIR sequences. Since the search word "dirigent" did not return dirigent proteins
exclusively, all of the 314 IiDIR sequences were searched against NCBI with BLAST using
default parameters to remove the non-dirigent sequences. The NCBI BLAST results were used to
search the I. indigotica database again to recover any IiDIR sequences that had been missed.
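
The two filters described above (an e-value cutoff of 1e-5 and removal of alignments shorter than 500 bp) can be sketched as a small post-processing step. The example assumes the hits were exported in the common 12-column tab-separated BLAST tabular layout (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); the function name, the interpretation of the cutoff as "e-value at or below 1e-5", and the sample rows are illustrative assumptions, not the authors' actual pipeline.

```python
import csv
import io


def filter_hits(tabular_text: str, max_evalue: float = 1e-5, min_length: int = 500):
    """Keep hits with e-value <= max_evalue and alignment length >= min_length.

    Assumes 12-column tab-separated BLAST tabular output; lines starting
    with '#' are treated as comments.
    """
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        length = int(row[3])     # column 4: alignment length
        evalue = float(row[10])  # column 11: e-value
        if evalue <= max_evalue and length >= min_length:
            kept.append((row[0], row[1], length, evalue))
    return kept


# Hypothetical hits: only the first passes both filters.
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ["query1", "unigene_A", "85.0", "620", "10", "2", "1", "620", "1", "620", "1e-50", "300"],
    ["query1", "unigene_B", "90.0", "300", "5", "0", "1", "300", "1", "300", "1e-30", "200"],
    ["query2", "unigene_C", "40.0", "700", "50", "9", "1", "700", "1", "700", "0.01", "40"],
])
print(filter_hits(SAMPLE))
```

Collecting the distinct subject identifiers from the kept rows would then give the list of candidate unigenes to carry into the domain check.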

To verify the reliability of the candidate IiDIRs, the simple modular architecture research
tool (SMART, http://smart.embl-heidelberg.de/) [20] was used with default parameters to find
the dirigent domain in each of the IiDIR amino acid sequences.

Sequence analysis 

All the DIR cDNA sequences mined from the database were analyzed for their basic
characteristics. NCBI Open Reading Frame Finder (ORF Finder)
(http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and Vector NTI Advance (TM) 11.0 were used to
identify the whole ORF of each sequence. Predictions of MW and pI were performed using the
entire ORFs in Vector NTI Advance (TM) 11.0. The TargetP 1.1 program accessible at
http://www.cbs.dtu.dk/services/TargetP/ [21] and the WoLF PSORT server
(http://www.genscript.com/psort/wolf_psort.html) [22] were used to predict the presence of
N-terminal signal peptides and the localization of the mature protein. The molecular formula
of each IiDIR protein was predicted by ProtParam (http://web.expasy.org/protparam/). Multiple
protein sequence alignments of the IiDIRs and BrDIRs were made with ClustalX 2.1. All
analyses were carried out using default settings.



Methods 

Discovery of IiDIRs from the transcription profiling
database

After 454 pyrosequencing for I. indigotica transcription profiling, paired-end Solexa
sequencing was carried out to maximize sequence diversity. All of the data were assembled to
provide a new database for the discovery of IiDIRs. The database consisted of 65,196 unigenes
with an average length of 1,503 bp. The largest unigene was 20,383 bp long, while the
smallest was 351 bp. Among all the unigenes, 30,131 were annotated [18].

In order to obtain all of the DIR sequences in the I. indigotica database, 1715 protein
sequences, 1047 nucleotide sequences and 193 EST records of DIRs from other plants were
downloaded from the National Center for Biotechnology Information
(http://www.ncbi.nlm.nih.gov/) using the search word "dirigent". All of the 2955 sequences
were used as queries to search the I. indigotica transcription profiling database with the
basic local alignment search tool (TBLASTN or BLASTN) to determine all of the candidate


Prediction of the secondary and tertiary structures of IiDIRs

Secondary structure predictions of the sequences were performed with NetSurfP
(http://www.cbs.dtu.dk/services/NetSurfP/) [25], PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/)
and Phyre2 (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index) [26] using default
parameters. The tertiary structures were predicted with Phyre2 in intensive modeling mode.
The same program was also used to search for homologs of the IiDIRs. The amino acid sequences
of the IiDIRs were used as the target sequences.

Computation of pairwise distances and
phylogenetic analysis

Sequence similarities among the 19 full-length amino
acid sequences were computed in MEGA 5.05 using the p-distance.
All of the phylogenetic trees were built using MEGA 5.05 with
1000 bootstrap replicates. CONSENSE, also from MEGA
5.05, was used to create a consensus tree. Bootstrap values
above 50% were added to the trees generated from the original
data set. The ML tree of 197 DIRs was built using
iTOL (http://itol.embl.de/) [33].
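
The p-distance used above is the proportion of aligned sites at which two sequences differ. The following Python sketch illustrates that definition only; it is not MEGA's code. It assumes that sites with a gap in either sequence are skipped (pairwise deletion), which the text does not specify. The sequence names and the toy alignment are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

GAP_CHARS = {"-", "?"}  # assumption: these symbols mark gaps or missing data


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (pairwise deletion,
    an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def pairwise_p_distances(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of sequences in an alignment."""
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(alignment, 2)
    }


# Hypothetical toy alignment, for illustration only.
toy_alignment = {
    "seqA": "MKT-LLVA",
    "seqB": "MKTALLIA",
    "seqC": "MRT-LLVG",
}

if __name__ == "__main__":
    for pair, dist in pairwise_p_distances(toy_alignment).items():
        print(pair, round(dist, 3))
```

Similarity between two sequences can then be read as 1 minus the p-distance.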









Plant materials 

I. indigotica plants were grown in the botanical garden
of the Second Military Medical University, Shanghai,
China, and identified by Professor Hanming Zhang. Fresh
roots, stems, leaves and flowers were harvested,
frozen immediately in liquid nitrogen, and stored
at -80°C for RNA isolation.



Preparation of RNA and cDNA 

Total RNA of I. indigotica was extracted separately from the stored
roots, stems, leaves and flowers using the TIANGEN
TRNzol-A+ Reagent for total RNA Isolation Kit (TIANGEN
BIOTECH (BEIJING) CO., LTD, Beijing, China). The
integrity of the RNA was checked on ethidium bromide-stained
agarose gels, and the purity of the RNA was determined
by UV spectrometry. The first-strand cDNA was reverse
transcribed following the TransScript First-Strand
cDNA Synthesis SuperMix User Manual (TransGen Biotech,
Beijing, China).

Real-time PCR 

Real-time PCR was conducted on an ABI 7500 PCR system
(Applied Biosystems, USA) using Fast SYBR Green
Master Mix (Applied Biosystems) according to the
manufacturer's instructions. Reaction mixtures contained 1.5 μL
of cDNA as template, 0.5 pmol of each primer and 10 μL
of 2× Fast SYBR Green Master Mix in a final volume of
20 μL. Gene-specific primers (Additional file 6) for each
IiDIR were designed with Primer Express 3.0 (Applied
Biosystems). Each primer pair was checked by BLASTN
searches against the I. indigotica RNA sequences to
confirm that the designed primers were dirigent specific.
Primer specificity (a single product of the expected length)
was confirmed by analysis on a 0.8% agarose gel and by
melting curve analysis. The actin gene served as the
quantification control. It was the best hit found in the I. indigotica
transcription profiling database through BLAST using 22
A. thaliana actin genes [NM_114519.2, NM_179953.2,
NM_125328.3, NM_121018.3, NM_115235.3, NM_112764.3,
NM_001085300.1, NM_001036427.2, NM_180280.1,
NM_103814.3, NM_129772.1, NM_180032.1, AY114679.1,
AY062702.1, AY120779.1, AK230311.1, U39480.1, U39449.1,
U42007.1, U41998.1, AF308778.1 and NM_112046.3].
Primers for I. indigotica actin are also listed in
Additional file 6.

The program for all real-time PCR reactions was: hold
at 95°C for 20 s; 40 cycles of 3 s at 95°C and 30 s at 60°C.
Data were analyzed using ABI 7500 SDS Real-Time PCR
system software (Applied Biosystems). All PCR reactions
were run as 3 technical replicates. Transcript abundance
of each IiDIR gene was normalized to actin as the control and
compared with root as the reference using the 2^(-ΔΔCt) method.
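
The 2^(-ΔΔCt) calculation can be sketched in Python as follows, with actin as the reference gene and root as the calibrator organ. This is an illustration under stated assumptions, not the ABI 7500 software's implementation. Averaging the three technical replicates before subtraction is an assumption, and all Ct values and the function name are hypothetical.

```python
from statistics import mean


def relative_abundance(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """Relative transcript abundance by the 2^(-ddCt) method.

    target_ct / reference_ct: Ct values of the gene of interest and of the
    reference gene (here actin) in the organ being tested.
    calib_target_ct / calib_reference_ct: the same two genes in the
    calibrator organ (here root).
    Technical replicates are averaged first (an assumption of this sketch).
    """
    delta_ct_sample = mean(target_ct) - mean(reference_ct)
    delta_ct_calibrator = mean(calib_target_ct) - mean(calib_reference_ct)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values (3 technical replicates each), for illustration only.
flower_dir = [24.0, 24.2, 23.8]
flower_actin = [18.0, 18.1, 17.9]
root_dir = [26.0, 26.1, 25.9]
root_actin = [18.0, 18.0, 18.0]

if __name__ == "__main__":
    fold = relative_abundance(flower_dir, flower_actin, root_dir, root_actin)
    print(f"flower relative to root: {fold:.2f}-fold")
```

By construction the calibrator organ itself has a relative abundance of 1, so values above 1 indicate higher transcript abundance than in root.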



Transcript abundance of IiDIRs in I. indigotica hairy roots
treated with MeJA

To gain insight into IiDIR transcript abundance induced
by MeJA, the Illumina RNA-Seq data provided by
Chen [18] were utilized. The RNA-Seq expression profile
data were generated using the Illumina HiSeq 2000 platform,
and included hairy roots of I. indigotica treated
with MeJA for 0, 1, 3, 6, 12 and 24 h. The 0 h sample was used
as the control to normalize the expression levels at the other
time points. Finally, the heat map was constructed from the
log2-transformed, normalized expression data in MultiExperiment
Viewer (MeV) [34].
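
The normalization to 0 h followed by log2 transformation can be sketched in Python as below. This shows only the arithmetic behind one heat map row and does not reproduce MeV. The pseudocount that guards against zero expression values is an assumption not stated in the text, and the expression values are hypothetical.

```python
import math

TIME_POINTS_H = (0, 1, 3, 6, 12, 24)  # MeJA treatment times from the text


def log2_ratio_to_control(expression, pseudocount=1.0):
    """log2 ratio of each time point to the first (0 h) value.

    `pseudocount` is added to every value to avoid division by zero or
    log2(0); it is an assumption of this sketch, not part of the source.
    """
    if len(expression) != len(TIME_POINTS_H):
        raise ValueError("expected one expression value per time point")
    control = expression[0] + pseudocount
    return [math.log2((value + pseudocount) / control) for value in expression]


# Hypothetical expression values for one gene, for illustration only.
toy_expression = [7.0, 15.0, 31.0, 15.0, 7.0, 3.0]

if __name__ == "__main__":
    row = log2_ratio_to_control(toy_expression)
    for hours, value in zip(TIME_POINTS_H, row):
        print(f"{hours:>2} h: {value:+.2f}")
```

Each gene gives one such row. Stacking the rows yields the matrix that a heat map tool displays, with the 0 h column equal to zero by construction.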

Availability of supporting data 

Sequence data from this article can be found in the GenBank
data libraries under accession numbers: AhDIR1,
AAZ20288.1; AsDIR1, AAY41607.1; AtDIR1, ABR46205.1;
AtDIR10, AAU90058.1; AtDIR11, AAQ65106.1; AtDIR12,
AEE82982.1; AtDIR13, AAP88352.1; AtDIR14, AEE82984.1;
AtDIR15, AEE86966.1; AtDIR16, AAP37695.1; AtDIR17,
CAB67637.1; AtDIR18, AEE83298.1; AtDIR19, AAO39937.1;
AtDIR2, AAP37801.1; AtDIR20, AAU15178.1; AtDIR21,
AEE34435.1; AtDIR22, AAU15153.1; AtDIR23, AAT71988.1;
AtDIR24, AEE79355.1; AtDIR25, AAP49521.1; AtDIR3,
AED95765.1; AtDIR5, AAQ65109.1; AtDIR6, AEE84795.1;
AtDIR7, AAQ89609.1; AtDIR8, AEE75389.1; AtDIR9,
AAR20779.1; AtDIRD4, AEC07124.1; FiDIR1, AAF25357.1;
FiDIR2, AAF25358.1; GbDIR1, AAS73001.2; GbDIR2,
AAY44415.1; HvDIR1, AAA87042.1; HvDIR2, AAA87041.1;
HvDIR3, AAB72098.1; NtDIR1, BAF02555.1; OsDIR1,
BAF20623.1; OsDIR10, BAF12227.1; OsDIR11, BAF22309.1;
OsDIR12, BAF22310.1; OsDIR13, BAF22318.2; OsDIR14,
BAC19943.1; OsDIR15, BAF22323.1; OsDIR16, BAC16397.1;
OsDIR17, AAM74352.1; OsDIR18, BAD25846.1; OsDIR19,
BAF13568.1; OsDIR2, BAF20624.1; OsDIR20, AAO17346.1;
OsDIR21, BAB64642.1; OsDIR22, BAD52647.1; OsDIR23,
BAF26452.2; OsDIR24, BAD53304.1; OsDIR25, AAM74358.1;
OsDIR26, AAM74346.1; OsDIR27, BAD03849.1; OsDIR28,
BAD03720.1; OsDIR29, BAD03711.1; OsDIR3, BAF20622.1;
OsDIR30, BAD03854.1; OsDIR31, BAF29307.1; OsDIR32,
BAF27737.1; OsDIR33, BAF27733.1; OsDIR34, AAX96293.1;
OsDIR35, BAF27735.1; OsDIR36, BAF27734.1; OsDIR37,
BAF29386.1; OsDIR38, BAF29514.1; OsDIR39, BAF29458.2;
OsDIR4, BAF23585.1; OsDIR40, BAF29387.1; OsDIR41,
BAF29454.1; OsDIR42, ABA94701.1; OsDIR43, BAH95407.1;
OsDIR44, BAB89759.1; OsDIR46, BAH92863.1; OsDIR47,
BAF27863.1; OsDIR48, BAF27866.1; OsDIR49, BAF27867.1;
OsDIR5, BAF26451.1; OsDIR50, BAF23524.2; OsDIR51,
BAD89460.1; OsDIR52, AAX96290.1; OsDIR53, ABA93522.1;
OsDIR54, AAX96314.1; OsDIR6, BAF22196.1; OsDIR7,
BAF22195.2; OsDIR8, BAB89617.1; OsDIR9, BAF20620.1;
PDIR1, ABD52112.1; PDIR10, ABD52121.1; PDIR11,
ABD52122.1; PDIR12, ABD52123.1; PDIR13, ABD52124.1;
PDIR14, ABD52125.1; PDIR15, ABD52126.1; PDIR16,
ABD52127.1; PDIR17, ABD52128.1; PDIR18, ABD52129.1;
PDIR19, ABD52130.1; PDIR2, ABD52113.1; PDIR20,
ABR27716.1; PDIR21, ABR27717.1; PDIR22, ABR27718.1;
PDIR23, ABR27719.1; PDIR24, ABR27720.1; PDIR25,
ABR27721.1; PDIR26, ABR27722.1; PDIR27, ABR27723.1;
PDIR28, ABR27724.1; PDIR29, ABR27725.1; PDIR3,
ABD52114.1; PDIR30, ABR27726.1; PDIR31, ABR27727.1;
PDIR32, ABR27728.1; PDIR33, ABR27729.1; PDIR34,
ABR27730.1; PDIR35, ABR27731.1; PDIR4, ABD52115.1;
PDIR5, ABD52116.1; PDIR6, ABD52117.1; PDIR7,
ABD52118.1; PDIR8, ABD52119.1; PDIR9, ABD52120.1;
PpDIR1, AAK38666.1; PsDIR1, AAD25355.1; PsDIR2,
AAB18669.1; SbDIR1, AAM94289.1; SbDIR2, ABI24164.1;
SiDIR1, AAT11124.1; SoDIR1, AAR00251.1; SoDIR2,
CAF25234.1; SoDIR3, AAV50047.1; TaDIR1, AAC49284.1;
TaDIR2, AAM46813.1; TaDIR3, BAA32786.3; TaDIR4,
AAR20919.1; TanDIR1, ABE73781.1; ThDIR1, AAF25367.1;
ThDIR2, AAF25368.1; TpDIR, AAF25364.1; TpDIR1,
AAF25359.1; TpDIR2, AAF25360.1; TpDIR3, AAF25361.1;
TpDIR4, AAF25362.1; TpDIR5, AAF25363.1; TpDIR7,
AAF25365.1; TpDIR8, AAF25366.1; TpDIR9, AAL92120.1;
and ZmDIR1, AAF71261.2.